Evaluation of tools for long read RNA-seq splice-aware alignment

نویسندگان

  • Kresimir Krizanovic
  • Amina Echchiki
  • Julien Roux
  • Mile Sikic
چکیده

Motivation High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique that is now routinely used by various fields, such as genetic research or diagnostics. The advent of third generation sequencing technologies providing significantly longer reads opens up new possibilities. However, the high error rates common to these technologies set new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we have explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some have claimed support for long Pacific Biosciences (PacBio) or even Oxford Nanopore Technologies (ONT) MinION reads. Results The tools were tested on synthetic and real datasets from two technologies (PacBio and ONT MinION). Alignment quality and resource usage were compared across different aligners. The effect of error correction of long reads was explored, both using self-correction and correction with an external short reads dataset. A tool was developed for evaluating RNA-seq alignment results. This tool can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long error-prone reads, others produced overall good results. We further show that alignment accuracy can be improved using error-corrected reads. Availability and implementation https://github.com/kkrizanovic/RNAseqEval, https://figshare.com/projects/RNAseq_benchmark/24391. Contact [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of tools for long read RNA-seq splice-aware alignment Supplement

For the purpose of the test, error profiles of all real datasets were determined, by aligning them to the corresponding transcriptome using GraphMap (https://github.com/isovic/graphmap) and by running the script errorrates.py from https://github.com/isovic/samscripts. Due to low quality of alignment of ONT Minion RNA reads to the transcriptome, an additional dataset of ONT MinION R9 DNA reads w...

متن کامل

mTim: Rapid and accurate transcript reconstruction from RNA-Seq data

Motivation: Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate computational tools for transcript reconstruction, this technology may also become useful for genome (re-)annotation, which has mostly relied on de no...

متن کامل

SPATA: A Seeding and Patching Algorithm for Hybrid Transcriptome Assembly

Transcriptome assembly from RNA-Seq reads is an active area of bioinformatics research. The ever-declining cost and the increasing depth of RNA-Seq have provided unprecedented opportunities to better identify expressed transcripts. However, the nonlinear transcript structures and the ultra-high throughput of RNA-Seq reads pose significant algorithmic and computational challenges to the existing...

متن کامل

Rna-sequencing Analysis: Read Alignment and Discovery and Reconstruction of fusion transcripts

Title of Document: RNA-SEQUENCING ANALYSIS: READ ALIGNMENT AND DISCOVERY AND RECONSTRUCTION OF FUSION TRANSCRIPTS Daehwan Kim, Doctor of Philosophy, 2013 Directed By: Professor Steven L. Salzberg, Department of Computer Science RNA-sequencing technologies, which sequence the RNA molecules being transcribed in cells, allow us to explore the process of transcription in exquisite detail. One of th...

متن کامل

MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery

The accurate mapping of reads that span splice junctions is a critical component of all analytic techniques that work with RNA-seq data. We introduce a second generation splice detection algorithm, MapSplice, whose focus is high sensitivity and specificity in the detection of splices as well as CPU and memory efficiency. MapSplice can be applied to both short (<75 bp) and long reads (≥ 75 bp). ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 34 5  شماره 

صفحات  -

تاریخ انتشار 2018